Using Wikipedia-based conceptual contexts to calculate document similarity

title	Using Wikipedia-based conceptual contexts to calculate document similarity
creator	Kaiser, Fabian
	Schwarz, Holger
	Jakob, Mihály
date	2009-02
language	eng


description	Rating the similarity of two or more text documents is an essential task in information retrieval. For example, document similarity can be used to rank search engine results or to cluster documents by topic. A major challenge in calculating document similarity is that two documents can share a topic, or even convey the same meaning, while using different wording to describe their content. A sophisticated algorithm therefore cannot operate directly on the texts; it must instead find a more abstract representation that captures their meaning. In this paper, we propose a novel approach for calculating the similarity of text documents. It builds on conceptual contexts derived from the content and structure of the Wikipedia hypertext corpus.
publisher	Cancun, Mexico: IEEE Computer Society
type	Text
	Article in Proceedings
source	In: ICDS2009: Proceedings of the 3rd International Conference on Digital Society, pp. 322-327
contributor	IPVS, Anwendersoftware
subject	Information Storage and Retrieval (CR H.3)
	Information Search and Retrieval (CR H.3.3)